Sentence Boundary Detection in Turkish

Authors

  • Bekir Taner Dinçer
  • Bahar Karaoglan
Abstract

In this paper, we describe a method for sentence boundary detection in Turkish. The method uses simple heuristic knowledge of Turkish syllabification and phonetic rules to disambiguate dots. The algorithm achieves a test accuracy of 96.02%. The main contribution of this study is a new lexicon-free method for distinguishing EOS (end-of-sentence) dots from dots used for other purposes.
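
The abstract does not spell out the heuristics. The sketch below is for illustration only and assumes one plausible reading. If the letters before a dot cannot be split into the syllable templates of native Turkish (V, VC, VCC, CV, CVC, CVCC), the token is probably an abbreviation such as "Dr" or "vb", so its dot is not treated as end-of-sentence. This check needs no lexicon. The function names `syllabify`, `looks_like_abbreviation`, and `split_sentences` are hypothetical. The extra requirement that the next word start with a capital letter is also an assumption. None of these are taken from the paper.

```python
# Hypothetical sketch of a lexicon-free, syllable-based dot disambiguator.
# Not the authors' algorithm; the rules below are illustrative assumptions.

# Standard Turkish vowels; any other letter is treated as a consonant.
VOWELS = set("aeıioöuü")
# Syllable templates generally permitted in native Turkish words.
ALLOWED_TEMPLATES = {"V", "VC", "VCC", "CV", "CVC", "CVCC"}


def turkish_lower(text: str) -> str:
    """Lowercase with Turkish dotted/dotless i rules (str.lower maps 'I' to 'i')."""
    return text.replace("I", "ı").replace("İ", "i").lower()


def cv_pattern(word: str) -> str:
    """Map each letter to 'V' (vowel) or 'C' (consonant)."""
    return "".join("V" if ch in VOWELS else "C" for ch in word)


def syllabify(word: str):
    """Split a lowercase word so the last consonant before each vowel opens a
    new syllable; return None if any piece is not an allowed Turkish template."""
    pattern = cv_pattern(word)
    vowels = [i for i, p in enumerate(pattern) if p == "V"]
    if not vowels:
        return None  # no vowel at all, e.g. "vb" or "bkz"
    starts = [0]
    for v in vowels[1:]:
        # A single consonant before the vowel starts the syllable; otherwise the vowel does.
        starts.append(v - 1 if pattern[v - 1] == "C" else v)
    pieces = [word[s:e] for s, e in zip(starts, starts[1:] + [len(word)])]
    if all(cv_pattern(p) in ALLOWED_TEMPLATES for p in pieces):
        return pieces
    return None


def looks_like_abbreviation(token: str) -> bool:
    """Assumed rule: an alphabetic token that cannot be syllabified is an abbreviation."""
    letters = turkish_lower(token)
    if not letters.isalpha():
        return False
    return syllabify(letters) is None


def split_sentences(text: str):
    """Split on dots, except dots after abbreviation-like tokens.
    Assumption: a boundary also needs a capitalized next word (or end of text)."""
    tokens = text.split()
    sentences, current = [], []
    for i, tok in enumerate(tokens):
        current.append(tok)
        if not tok.endswith("."):
            continue
        word = tok.rstrip(".")
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        next_is_capital = nxt is None or nxt[0].isupper()
        if word and not looks_like_abbreviation(word) and next_is_capital:
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return sentences


print(split_sentences("Dr. Ahmet geldi. Yarın gidecek."))
```

In the example, capitalization alone would wrongly end a sentence after "Dr." because "Ahmet" is capitalized. The syllable check rejects "dr", which has no vowel, so that dot is kept inside the sentence. The heuristic has clear limits. Pronounceable abbreviations such as "doç" (doçent) pass the check. Loanwords with initial consonant clusters such as "tren" would be flagged as abbreviations. A practical system would need further rules.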

Similar resources

An Infrastructure for Turkish Prosody Generation in Text-to-Speech Synthesis

Text-to-speech engines benefit from natural language processing when generating the appropriate prosody. In this study, we investigate the natural language processing infrastructure for Turkish prosody generation in three steps: pronunciation disambiguation, phonological phrase detection, and intonation level assignment. We focus on phrase boundary detection and intonation assignment. We prop...

Semantic Role Labeling of Persian Sentences Using a Memory-Based Learning Approach

Extracting semantic roles is one of the major steps in representing text meaning. It refers to finding the semantic relations between a predicate and the syntactic constituents of a sentence. In this paper we present a semantic role labeling system for Persian, using a memory-based learning model and standard features. Our proposed system implements a two-phase architecture to first identify...

Rule-Based Sentence Detection Method (RBSDM) for Turkish

The first step in building a corpus that is representative of a language is determining its sentences, a task that is complex and hard to solve but essential to corpus generation. Different approaches have been tried to find sentence boundaries in various languages. In Turkish, the best-known ways of determining sentence boundaries are using statistics and mach...

TAG Analysis of Turkish Long Distance Dependencies

All permutations of a two-level embedding sentence in Turkish are analyzed in order to develop an LTAG grammar that can account for Turkish long-distance dependencies. The fact that Turkish allows only long-distance topicalization and extraposition is shown to be connected to a condition (the coherence condition) that draws the boundary between the acceptable and unacceptable permutations of the ...

Speech Communication Session 4pSCb: Production and Perception I: Beyond the Speech Segment (Poster Session) 4pSCb49. Towards a model of intonational phonology of Turkish: Neutral intonation

This study proposes an Autosegmental-Metrical model of Turkish intonation based on sentences produced in neutral focus, as part of our ongoing research investigating Turkish intonational phonology. Tonal patterns of utterances were examined by varying the length of a word and a phrase, the location of stress, syntactic structures, and sentence types. Preliminary results suggest that Turkish has...

Publication date: 2004